Tiered Sampling: An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams

Abstract

We introduce Tiered Sampling, a novel technique for approximately counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size $M$, which can be orders of magnitude smaller than the number of edges. Our method addresses the challenging task of counting sparse motifs: sub-graph patterns that have a low probability of appearing in a sample of $M$ edges of the graph, which is the maximum amount of data available to the algorithm at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, the other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif being counted, thus decreasing the variance and error of the estimate.

We demonstrate the advantage of our method in the specific application of counting sparse 4-cliques and 5-cliques in massive graphs. We present a complete analytical treatment and extensive experimental results on both synthetic and real-world data. Our results show that the method obtains high-quality approximations of the number of 4-cliques and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge-sample approach to counting sparse motifs in large-scale graphs.
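The abstract describes the tiers only at a high level. Below is a minimal Python sketch of a two-tier layout for 4-cliques, assuming standard reservoir sampling (Algorithm R) at both tiers: tier 1 holds edges, tier 2 holds triangles found among the sampled edges, and a 4-clique is detected when a new edge, a stored triangle, and two stored edges complete it. The detection rule, the class names (`Reservoir`, `TwoTierSampler`) and the memory split (`m_edges`, `m_triangles`) are hypothetical choices for illustration, not the authors' implementation. The sketch only counts raw detections; it does not reproduce the inverse-probability weights that an unbiased estimate requires.

```python
import random
from itertools import combinations


class Reservoir:
    """Fixed-capacity uniform sample of a stream (Algorithm R)."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []       # sampled items, at most `capacity` of them
        self.members = set()  # same items, for O(1) membership tests
        self.seen = 0         # number of items offered so far

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.members.add(item)
            return
        # Keep the new item with probability capacity / seen,
        # evicting a uniformly chosen current item.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.members.discard(self.items[j])
            self.items[j] = item
            self.members.add(item)


class TwoTierSampler:
    """Hypothetical two-tier sketch: tier 1 samples edges, tier 2 samples
    triangles found among tier-1 edges. Counts raw 4-clique detections
    only; no inverse-probability weighting is applied."""

    def __init__(self, m_edges, m_triangles, seed=0):
        rng = random.Random(seed)
        self.edges = Reservoir(m_edges, rng)          # tier 1: edges
        self.triangles = Reservoir(m_triangles, rng)  # tier 2: triangles
        self.detections = 0

    def _adjacency(self):
        adj = {}
        for e in self.edges.items:
            a, b = tuple(e)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        return adj

    def process(self, u, v):
        """Handle stream edge (u, v); return how many 4-cliques it closes
        that are visible in the current samples."""
        sampled_edges = self.edges.members
        closed = set()
        # 1. Detect 4-cliques {u, v, w, x}: a stored triangle holds exactly
        #    one endpoint (the apex) plus w and x, and both edges from the
        #    other endpoint to w and x are still in tier 1.
        for tri in self.triangles.items:
            if (u in tri) == (v in tri):
                continue  # neither endpoint (both is impossible: (u, v) is new)
            apex, other = (u, v) if u in tri else (v, u)
            w, x = tri - {apex}
            if (frozenset((other, w)) in sampled_edges
                    and frozenset((other, x)) in sampled_edges):
                closed.add(frozenset((u, v, w, x)))
        self.detections += len(closed)
        # 2. Triangles closed by (u, v) over tier-1 edges are fed to tier 2.
        adj = self._adjacency()
        for w in adj.get(u, set()) & adj.get(v, set()):
            self.triangles.offer(frozenset((u, v, w)))
        # 3. Finally the edge itself is offered to tier 1.
        self.edges.offer(frozenset((u, v)))
        return len(closed)


if __name__ == "__main__":
    # Complete graph K5 as an edge stream; it contains 5 four-cliques.
    sampler = TwoTierSampler(m_edges=100, m_triangles=100)
    for a, b in combinations(range(5), 2):
        sampler.process(a, b)
    print(sampler.detections)  # 5 (memory holds the whole stream)
```

With memory large enough for the whole stream, every triangle reaches tier 2 and each 4-clique is found exactly once, at its last edge. With a small `m_edges`, a triangle can remain in tier 2 after its own edges have been evicted from tier 1, which is the kind of effect the abstract credits for detecting sparse motifs more often than a single edge sample.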

Bibliographic record

Authors: De Stefani, Lorenzo; Terolli, Erisa; Upfal, Eli
Year: 2017
Original format: PDF

Similar documents

1. Tiered Sampling: An Efficient Method for Counting Sparse Motifs in Massive Graph Streams [J]. De Stefani Lorenzo, Terolli Erisa, Upfal Eli. ACM Transactions on Knowledge Discovery from Data, 2021, No. 5.

2. A comparison between approximate counting and sampling methods for frequent pattern mining on data streams [J]. Willie Ng, Manoranjan Dash. Intelligent Data Analysis, 2010, No. 6.

3. Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs [J]. Lim Yongsub, Jung Minsoo, Kang U. ACM Transactions on Knowledge Discovery from Data, 2018, No. 1.

4. Tiered sampling: An efficient method for approximate counting sparse motifs in massive graph streams [C]. Lorenzo De Stefani, Erisa Terolli, Eli Upfal. IEEE International Conference on Big Data, 2017.

5. Improved Triangle Counting in Graph Streams: Neighborhood Multi-sampling [D]. Hanjani, Kiana Mousavi, 2018.

6. Efficient sparse estimation on interval-censored data with approximated L0 norm: Application to child mortality [O]. Yan Chen, Yulu Zhao, 2021.

7. Efficient semi-streaming algorithms for local triangle counting in massive graphs [O]. Luca Becchetti, Paolo Boldi, Aristides Gionis, 2013.
